Abstract

We establish connections between the size of circuits and formulas 
computing monotone Boolean functions and the size of first-order and 
nonrecursive Datalog rewritings for conjunctive queries over OWL 2 QL 
ontologies. We use known lower bounds and separation results from circuit 
complexity to prove similar results for the size of rewritings that do not use 
non-signature constants. For example, we show that, in the worst case, 
positive existential and nonrecursive Datalog rewritings are exponentially 
longer than the original queries; nonrecursive Datalog rewritings are in 
general exponentially more succinct than positive existential rewritings; 
while first-order rewritings can be superpolynomially more succinct than 
positive existential rewritings. 

1 Introduction 

First-order (FO) rewritability is the key concept of ontology-based data ac- 
cess (OBDA) [T51 [H3 HZ]) which is believed to lie at the foundations of the 
next generation of information systems. An ontology language C enjoys FO- 
rewritability if any conjunctive query q over an ontology T, formulated in C, 
can be transformed into an FO-formula q' such that, for any data A, all an- 
swers to q over the knowledge base (T, A) can be found by querying q' over A 
only using a standard relational database management system (RDBMS). On- 
tology languages with this property include the OWL2QL profile of the Web 
Ontology Language OWL 2, which is based on description logics of the DL-Lite 
family [T^l H] , and fragments of Datalog$^\pm$ such as linear or sticky TGDs [TU1 E] .
Various rewriting techniques have been implemented in the systems QuOnto pQ , 
REQUIEM [25], Presto [33], Nyaya [TB], Quest$^1$ and IQAROS$^2$.

OBDA via FO-rewritability relies on the empirical fact that RDBMSs are 
usually very efficient in practice. However, this does not mean that they can ef- 
ficiently evaluate any given query: after all, for expression complexity, database 
query answering is PSPACE-complete for FO-queries and NP-complete for con- 
junctive queries (CQs). Indeed, the first 'naive' rewritings of CQs over OWL 2 QL 
ontologies turned out to be too lengthy even for modern RDBMSs [12, 26]. The
obvious next step was to develop various optimisation techniques [331 [TBI I3T1I32"] ; 
however, they still produced rewritings of exponential size, $O((|\mathcal{T}| \cdot |q|)^{|q|})$,
in the worst case. An alternative two-step combined approach to OBDA with 
OWL 2 EL and OWL2QL, suggested in [Ml EH H3J, first expands the data 
by applying the ontology axioms, introduces new individuals required by the 
ontology, and then uses all this in the rewriting. Yet, even with these extra re- 
sources a simple polynomial rewriting was constructed only for the fragment of 
OWL2QL without role inclusions; the rewriting for the full language remained 
exponential. A breakthrough seemed to come in [17], which showed that one
can construct, in polynomial time, a nonrecursive Datalog rewriting for some 
fragments of Datalog$^\pm$ containing OWL 2 QL. However, this rewriting uses the
built-in predicate $\ne$ and numerical constants that are not present in the original
query and ontology. Without such additional constants, as shown in [50], no 
FO-rewriting for OWL2QL can be constructed in polynomial time. 

This development brings forward a spectrum of theoretical and practical 
questions that could influence the future of OBDA. What is the worst-case size 
of FO- and nonrecursive Datalog rewritings for CQs over OWL 2 QL ontologies? 
(The question whether there exists a polynomial-size FO-rewriting without ad- 
ditional constants was left open in [2Q].) What is the type/shape/size of rewrit- 
ings we should aim at to make OBDA with OWL2QL efficient? What extra 
means (e.g., built-in predicates and constants) can be used in the rewritings? 

In this paper, we investigate the worst-case size of FO- and nonrecursive Dat- 
alog rewritings for CQs over OWL2QL ontologies depending on the available 
means. We distinguish between 'pure' rewritings, which cannot use constants 
that do not occur in the original query, and 'impure' ones, where such constants 
are allowed. Our results can be summarised as follows: 

- An exponential blow-up is unavoidable for pure positive existential rewrit- 
ings and pure nonrecursive Datalog rewritings. Even pure FO-rewritings 
with = can blow up superpolynomially unless NP $\subseteq$ P/poly.

- Pure nonrecursive Datalog rewritings are in general exponentially more 
succinct than pure positive existential rewritings. 

- Pure FO-rewritings can be superpolynomially more succinct than pure 
positive existential rewritings. 

1 http://obda.inf.unibz.it/protege-plugin/quest/quest.html
2 http://code.google.com/p/iqaros/
- Impure positive existential rewritings can always be made polynomial, and
so they are exponentially more succinct than pure rewritings. 

We obtain these results by first establishing connections between pure rewritings 
for conjunctive queries over OWL2QL ontologies and circuits for monotone 
Boolean functions, and then using known lower bounds and separation results 
for the circuit complexity of such functions as Clique(n, k), 'a graph with n
nodes contains a k-clique', or Matching(2n), 'a bipartite graph with n vertices
in each part has a perfect matching.'



2 Queries over OWL 2 QL Ontologies 

By a signature, $\Sigma$, we understand in this paper any set of constant symbols
and predicate symbols (with their arity). Unless explicitly stated otherwise, $\Sigma$
does not contain any predicates with fixed semantics, such as $=$ or $\ne$. In the
description logic (or OWL 2 QL) setting, constant symbols are called individual
names, $a_i$, while unary and binary predicate symbols are called concept names,
$A_i$, and role names, $P_i$, respectively, where $i \ge 1$.

The language of OWL 2 QL is built using those names in the following way. 
The roles R, basic concepts B and concepts C of OWL2QL are defined by the 
grammar: 



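$$
\begin{array}{rcll}
R & ::= & P_i \mid P_i^-, & (P_i(x,y) \mid P_i(y,x)) \\
B & ::= & \bot \mid A_i \mid \exists R, & (\bot \mid A_i(x) \mid \exists y\, R(x,y)) \\
C & ::= & B \mid \exists R.B, & (B(x) \mid \exists y\, (R(x,y) \wedge B(y)))
\end{array}
$$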



where the formulas on the right give a first-order translation of the OWL2QL 
constructs. An OWL2QL TBox, T, is a finite set of inclusions of the form 

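$$
\begin{array}{ll}
B \sqsubseteq C, & (\forall x\, (B(x) \to C(x))) \\
R_1 \sqsubseteq R_2, & (\forall x, y\, (R_1(x,y) \to R_2(x,y))) \\
B_1 \sqcap B_2 \sqsubseteq \bot, & (\forall x\, (B_1(x) \wedge B_2(x) \to \bot)) \\
R_1 \sqcap R_2 \sqsubseteq \bot. & (\forall x, y\, (R_1(x,y) \wedge R_2(x,y) \to \bot))
\end{array}
$$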

Note that concepts of the form 3R.B can only occur in the right-hand side of 
concept inclusions in OWL 2 QL. An ABox, A, is a finite set of assertions of the 
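form $A_k(a_i)$ and $P_k(a_i, a_j)$. $\mathcal{T}$ and $\mathcal{A}$ together form the knowledge base (KB)
$\mathcal{K} = (\mathcal{T}, \mathcal{A})$. The semantics for OWL 2 QL is defined in the usual way [6], based
on interpretations $\mathcal{I} = (\Delta^{\mathcal{I}}, \cdot^{\mathcal{I}})$ with domain $\Delta^{\mathcal{I}}$ and interpretation function $\cdot^{\mathcal{I}}$.

The set of individual names in an ABox $\mathcal{A}$ will be denoted by $\mathrm{ind}(\mathcal{A})$. For
concepts or roles $E_1$ and $E_2$, we write $E_1 \sqsubseteq_{\mathcal{T}} E_2$ if $\mathcal{T} \models E_1 \sqsubseteq E_2$, and we set
$[E] = \{E' \mid E \sqsubseteq_{\mathcal{T}} E' \text{ and } E' \sqsubseteq_{\mathcal{T}} E\}$.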

A conjunctive query (CQ) $q(\vec{x})$ is a first-order formula $\exists \vec{y}\, \varphi(\vec{x}, \vec{y})$, where $\varphi$ is
constructed, using $\wedge$, from atoms of the form $A_k(t_1)$ and $P_k(t_1, t_2)$, where each
$t_i$ is a term (an individual or a variable from $\vec{x}$ or $\vec{y}$). A tuple $\vec{a} \subseteq \mathrm{ind}(\mathcal{A})$ is a
certain answer to $q(\vec{x})$ over $\mathcal{K} = (\mathcal{T}, \mathcal{A})$ if $\mathcal{I} \models q(\vec{a})$ for all models $\mathcal{I}$ of $\mathcal{K}$; in
this case we write $\mathcal{K} \models q(\vec{a})$.

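Query answering over OWL 2 QL KBs is based on the fact that, for any
consistent KB $\mathcal{K} = (\mathcal{T}, \mathcal{A})$, there is an interpretation $\mathcal{C}_{\mathcal{K}}$ such that, for all CQs
$q(\vec{x})$ and $\vec{a} \subseteq \mathrm{ind}(\mathcal{A})$, we have $\mathcal{K} \models q(\vec{a})$ iff $\mathcal{C}_{\mathcal{K}} \models q(\vec{a})$. The interpretation $\mathcal{C}_{\mathcal{K}}$,
called the canonical model of $\mathcal{K}$, can be constructed as follows. For each pair
$[R]$, $[B]$ with $\exists R.B$ in $\mathcal{T}$ (note that $\exists R.\top$ is just another way of writing $\exists R$),
we introduce a fresh symbol $w_{[RB]}$ and call it the witness for $\exists R.B$. We write
$\mathcal{K} \models C(w_{[RB]})$ if $\exists R^- \sqsubseteq_{\mathcal{T}} C$ or $B \sqsubseteq_{\mathcal{T}} C$. Define a generating relation, $\leadsto$, on
the set of these witnesses together with $\mathrm{ind}(\mathcal{A})$ by taking: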

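- $a \leadsto w_{[RB]}$ if $a \in \mathrm{ind}(\mathcal{A})$, $[R]$ and $[B]$ are $\sqsubseteq_{\mathcal{T}}$-minimal such that $\mathcal{K} \models \exists R.B(a)$
and there is no $b \in \mathrm{ind}(\mathcal{A})$ with $\mathcal{K} \models R(a, b) \wedge B(b)$;

- $w_{[R'B']} \leadsto w_{[RB]}$ if $u \leadsto w_{[R'B']}$, for some $u$, $[R]$ and $[B]$ are $\sqsubseteq_{\mathcal{T}}$-minimal
such that $\mathcal{K} \models \exists R.B(w_{[R'B']})$ and it is not the case that $R' \sqsubseteq_{\mathcal{T}} R^-$ and
$\mathcal{K} \models B(u)$.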

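If $a \leadsto w_{[R_1B_1]} \leadsto \dots \leadsto w_{[R_nB_n]}$, $n \ge 0$, then we say that $a$ generates the path
$a\, w_{[R_1B_1]} \cdots w_{[R_nB_n]}$. Denote by $\mathrm{path}_{\mathcal{K}}(a)$ the set of paths generated by $a$, and
by $\mathrm{tail}(\pi)$ the last element in $\pi \in \mathrm{path}_{\mathcal{K}}(a)$. $\mathcal{C}_{\mathcal{K}}$ is defined by taking:

$$
\begin{aligned}
\Delta^{\mathcal{C}_{\mathcal{K}}} &= \bigcup_{a \in \mathrm{ind}(\mathcal{A})} \mathrm{path}_{\mathcal{K}}(a), \qquad a^{\mathcal{C}_{\mathcal{K}}} = a, \text{ for } a \in \mathrm{ind}(\mathcal{A}), \\
A^{\mathcal{C}_{\mathcal{K}}} &= \{\pi \in \Delta^{\mathcal{C}_{\mathcal{K}}} \mid \mathcal{K} \models A(\mathrm{tail}(\pi))\}, \\
P^{\mathcal{C}_{\mathcal{K}}} &= \{(a, b) \in \mathrm{ind}(\mathcal{A}) \times \mathrm{ind}(\mathcal{A}) \mid \mathcal{K} \models P(a, b)\} \;\cup \\
&\quad\; \{(\pi, \pi \cdot w_{[RB]}) \mid \mathrm{tail}(\pi) \leadsto w_{[RB]},\ R \sqsubseteq_{\mathcal{T}} P\} \;\cup \\
&\quad\; \{(\pi \cdot w_{[RB]}, \pi) \mid \mathrm{tail}(\pi) \leadsto w_{[RB]},\ R \sqsubseteq_{\mathcal{T}} P^-\}.
\end{aligned}
$$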

The following result is standard: 

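Theorem 1 ([H|22]). For every OWL 2 QL KB $\mathcal{K} = (\mathcal{T}, \mathcal{A})$, every CQ $q(\vec{x})$
and every $\vec{a} \subseteq \mathrm{ind}(\mathcal{A})$, $\mathcal{K} \models q(\vec{a})$ iff $\mathcal{C}_{\mathcal{K}} \models q(\vec{a})$.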

3 Query Rewriting 

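Let $\Sigma$ be a signature that can be used to formulate queries and ABoxes
(remember that $\Sigma$ does not contain any built-in predicates). We denote the set
of individuals in $\Sigma$ by $\mathrm{ind}(\Sigma)$. Given an ABox $\mathcal{A}$ over $\Sigma$, define $\mathcal{I}_{\mathcal{A}}$ to be the
interpretation with domain $\mathrm{ind}(\Sigma)$ such that, for any predicate $E(\vec{x})$ in $\Sigma$ and
any $\vec{a} \subseteq \mathrm{ind}(\Sigma)$, we have $\mathcal{I}_{\mathcal{A}} \models E(\vec{a})$ iff $E(\vec{a}) \in \mathcal{A}$.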

Given a CQ $q(\vec{x})$ and an OWL 2 QL TBox $\mathcal{T}$, a first-order formula $q'(\vec{x})$ over
$\Sigma$ is called an FO-rewriting for $q(\vec{x})$ and $\mathcal{T}$ if, for any ABox $\mathcal{A}$ over $\Sigma$ and any
$\vec{a} \subseteq \mathrm{ind}(\mathcal{A})$, we have $(\mathcal{T}, \mathcal{A}) \models q(\vec{a})$ iff $\mathcal{I}_{\mathcal{A}} \models q'(\vec{a})$. If $q'$ is an FO-rewriting of
the form $\exists \vec{y}\, \varphi(\vec{x}, \vec{y})$, where $\varphi$ is built from atoms using only $\wedge$ and $\vee$, then we
call $q'(\vec{x})$ a positive existential rewriting for $q(\vec{x})$ and $\mathcal{T}$ (or a PE-rewriting, for
short). The size $|q'|$ of $q'$ is the number of symbols in $q'$.

All known FO-rewritings for CQs and OWL2QL ontologies are of exponen- 
tial size in the worst case. More precisely, for any CQ q and any OWL 2 QL TBox 
$\mathcal{T}$, one can construct a PE-rewriting of size $O((|\mathcal{T}| \cdot |q|)^{|q|})$ El [13 El El US •
One of the main results of this paper is that this upper bound cannot be substan-
tially improved in general. On the other hand, we shall see that FO-rewritings 
can be superpolynomially more succinct than pure PE-rewritings. 

We shall also consider query rewritings in the form of nonrecursive Datalog 
queries. We remind the reader (for details see, e.g., [13]) that a Datalog program, 
$\Pi$, is a finite set of Horn clauses

$$\forall \vec{x}\, (A_1 \wedge \dots \wedge A_m \to A_0),$$

where each $A_i$ is an atom of the form $P(t_1, \dots, t_l)$ and each $t_j$ is either a variable
from $\vec{x}$ or a constant. $A_0$ is called the head of the clause, and $A_1, \dots, A_m$ its
body. All variables occurring in the head $A_0$ must also occur in the body, i.e.,
in one of the $A_i$, $1 \le i \le m$. A predicate $P$ depends on a predicate $Q$ in $\Pi$
if $\Pi$ contains a clause whose head's predicate is $P$ and whose body contains
an atom with predicate $Q$. A Datalog program $\Pi$ is called nonrecursive if this
dependence relation for $\Pi$ is acyclic. A nonrecursive Datalog query consists of a
nonrecursive Datalog program $\Pi$ and a goal $G$, which is just a predicate. Given
an ABox $\mathcal{A}$, a tuple $\vec{a} \subseteq \mathrm{ind}(\mathcal{A})$ is called a certain answer to $(\Pi, G)$ over $\mathcal{A}$ if
$\Pi, \mathcal{A} \models G(\vec{a})$. The size $|\Pi|$ of $\Pi$ is the number of symbols in $\Pi$.
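The acyclicity condition on the dependence relation can be checked mechanically. The following is a minimal sketch, assuming a hypothetical encoding in which a clause is a pair (head atom, list of body atoms) and an atom is a pair (predicate name, argument tuple); the function names are illustrative and not part of the paper.

```python
from collections import defaultdict

def depends_on(program):
    """Map each head predicate to the predicates occurring in the bodies of its clauses."""
    graph = defaultdict(set)
    for (head_pred, _args), body in program:
        for body_pred, _body_args in body:
            graph[head_pred].add(body_pred)
    return graph

def is_nonrecursive(program):
    """Return True iff the dependence relation of the program is acyclic (DFS cycle check)."""
    graph = depends_on(program)
    WHITE, GREY, BLACK = 0, 1, 2
    colour = defaultdict(int)  # every predicate starts WHITE

    def visit(p):
        colour[p] = GREY
        for q in graph.get(p, ()):
            if colour[q] == GREY:          # back edge: p depends on itself via q
                return False
            if colour[q] == WHITE and not visit(q):
                return False
        colour[p] = BLACK
        return True

    return all(colour[p] != WHITE or visit(p) for p in list(graph))

# G(x) <- A(x), Q(x, y);   Q(x, y) <- P(x, y)
NONREC = [
    (("G", ("x",)), [("A", ("x",)), ("Q", ("x", "y"))]),
    (("Q", ("x", "y")), [("P", ("x", "y"))]),
]
# Adding Q(x, y) <- Q(x, z), P(z, y) makes Q depend on itself.
REC = NONREC + [(("Q", ("x", "y")), [("Q", ("x", "z")), ("P", ("z", "y"))])]
```

Here `is_nonrecursive(NONREC)` holds while `is_nonrecursive(REC)` does not, since the added clause introduces a cycle on `Q`.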

We distinguish between pure and impure nonrecursive Datalog queries [7]. 
In a pure query $(\Pi, G)$, the clauses in $\Pi$ do not contain constant symbols in
their heads. One reason for considering only pure queries in the OBDA setting
is that impure ones can have too much impact on the data. For example, an
impure query can explicitly add a ground atom $A_0(a)$ to the database, which
has nothing to do with the intensional knowledge in the background ontologies. 
In fact, impure nonrecursive Datalog queries are known to be more succinct 
than pure ones. 

Given a CQ $q(\vec{x})$ and an OWL 2 QL TBox $\mathcal{T}$, a pure nonrecursive Datalog
query $(\Pi, G)$ is called a nonrecursive Datalog rewriting for $q(\vec{x})$ and $\mathcal{T}$
(or an NDL-rewriting, for short) if, for any ABox $\mathcal{A}$ and any $\vec{a} \subseteq \mathrm{ind}(\mathcal{A})$, we
have $(\mathcal{T}, \mathcal{A}) \models q(\vec{a})$ iff $\Pi, \mathcal{A} \models G(\vec{a})$. Similarly to FO-rewritings, known
NDL-rewritings are of exponential size [33l EE]. Here we show that, in general, one
cannot make NDL-rewritings shorter. On the other hand, NDL-rewritings can 
be exponentially more succinct than PE-rewritings. 

The rewritings can be much shorter if non-signature predicates and constants 
become available. As follows from [17], every CQ over an OWL 2 QL ontology
can be rewritten as a polynomial-size nonrecursive Datalog query if we can use 
the inequality predicate and at least two distinct constants (cf. also [5] which 
shows how two constants and = can be used to eliminate definitions from first- 
order theories without an exponential blow-up). In fact, we observe that, using 
equality and two distinct constants, any CQ over an OWL2QL ontology can 
be rewritten into a PE-query of polynomial size. 
4 Boolean Circuits



In this section, we give a brief introduction to Boolean circuits and obtain some 
results that will be used in what follows. 

An $n$-ary Boolean function, for $n \ge 1$, is a function $f \colon \{0,1\}^n \to \{0,1\}$. A
family of Boolean functions is a sequence $f^n$, $n \ge 1$, where each $f^n$ is an $n$-ary
Boolean function. We will deliberately abuse notation and employ the same
symbol such as $f$ to denote both a family of Boolean functions and a particular
function in the family. For example, by saying that a function $f$ is in the class
coNP we mean that there exist a polynomial $p \colon \mathbb{N} \to \mathbb{N}$ and a Boolean function
$g$, computable in polynomial time, such that, for $n \ge 1$ and $x \in \{0,1\}^n$, we have
$f(x) = 0$ iff $g(x, y) = 0$, for some $y \in \{0,1\}^{p(n)}$ (think of $f$ as the characteristic
function of a language which is in coNP). The variables $y$ in $g$ are called advice
variables.

We remind the reader (for more details see, e.g., [3j [19]) that, for every 
n > 1, an n-input Boolean circuit, C, is a directed acyclic graph with n sources 
(inputs) and one sink (output). Every non-source node of C is called a gate; it 
is labelled with either $\wedge$ or $\vee$, in which case it has two incoming edges, or with
$\neg$, in which case it has one incoming edge. A circuit is monotone if it contains
only $\wedge$ and $\vee$ gates. The number of nodes in $C$ will be denoted by $|C|$. We
think of a Boolean formula as a circuit in which every gate has at most one
outgoing edge. If $x \in \{0,1\}^n$, then $C(x)$ is the output of $C$ on input $x$. We say
that $C$ computes a Boolean function $f$ if $C(x) = f(x)$, for every $x \in \{0,1\}^n$.

Given a function $T \colon \mathbb{N} \to \mathbb{N}$, a $T$-size family of circuits is a sequence $C^n$, for
$n \ge 1$, of $n$-input Boolean circuits of size $|C^n| \le T(n)$. As with Boolean functions,
we abuse notation and employ $C$ to denote a family of Boolean circuits.
The class of languages that are decidable by families of polynomial-size circuits 
is denoted by P/poly. 
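The circuit definitions above can be illustrated by a minimal sketch, assuming a hypothetical encoding in which nodes $0, \dots, n-1$ are the inputs and the gates are listed in topological order, each referring only to earlier nodes; the last gate is the sink. The names `evaluate`, `is_monotone` and `size` are illustrative only.

```python
def evaluate(circuit, inputs):
    """Evaluate a circuit given as a topologically ordered list of gates.

    Gate t (0-based) becomes node n + t. A gate is ("and", u, v),
    ("or", u, v) or ("not", u), where u and v are earlier node indices.
    """
    values = list(inputs)
    for gate in circuit:
        op = gate[0]
        if op == "and":
            values.append(values[gate[1]] & values[gate[2]])
        elif op == "or":
            values.append(values[gate[1]] | values[gate[2]])
        elif op == "not":
            values.append(1 - values[gate[1]])
        else:
            raise ValueError(f"unknown gate type: {op}")
    return values[-1]  # value at the sink

def is_monotone(circuit):
    """A circuit is monotone if it contains only AND and OR gates."""
    return all(gate[0] in ("and", "or") for gate in circuit)

def size(circuit, n):
    """|C| is the number of nodes: n inputs plus the gates."""
    return n + len(circuit)

# Majority of three inputs: (x0 & x1) | (x0 & x2) | (x1 & x2).
MAJ3 = [("and", 0, 1), ("and", 0, 2), ("and", 1, 2), ("or", 3, 4), ("or", 6, 5)]
```

In this example every gate feeds at most one other gate, so `MAJ3` is also a monotone formula in the sense defined above.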

We shall use three well-known families of monotone Boolean functions in 
NP: 

Clique(n, k) is the function of $n(n-1)/2$ variables $e_{ij}$, $1 \le i < j \le n$, which
returns 1 iff the graph with vertices $\{1, \dots, n\}$ and edges $\{\{i, j\} \mid e_{ij} = 1\}$
contains a $k$-clique. A series of papers, started by Razborov's breakthrough
[3D] , gave an exponential lower bound for the size of monotone circuits
computing Clique(n, k): $2^{\Omega(\sqrt{k})}$ for all $k \le \frac{1}{4}(n/\log n)^{2/3}$ (see [2]).
For monotone formulas, an even better lower bound was obtained in [29]:
$2^{\Omega(k)}$ for $k = 2n/3$. Since Clique(n, k) is NP-complete, the question
whether Clique(n, k) can be computed by a polynomial-size Boolean circuit
is equivalent to the open NP $\subseteq$ P/poly problem.

Matching(2n) is the function of $n^2$ variables $e_{ij}$, $1 \le i, j \le n$, defined as
follows. The variables give a bipartite graph $G$ with $n$ vertices in each
part: it contains an edge $\{i, j\}$ iff $e_{ij} = 1$. Matching(2n) returns 1 iff
there is a perfect matching in $G$, that is, a subset $E$ of edges in $G$ such
that every node in $G$ occurs exactly once in $E$. An exponential lower
bound for the size of monotone formulas computing Matching(2n) was
obtained in [23]: $2^{\Omega(n)}$. However, non-monotone formulas computing this
function are of size $n^{O(\log n)}$ [9].

Gen($n^3$) is the function of $n^3$ variables $x_{ijk}$, $1 \le i, j, k \le n$, defined as follows.
We say that 1 generates $k \le n$ if either $k = 1$ or, for some $i$ and $j$ such
that $x_{ijk} = 1$, 1 generates both $i$ and $j$. Gen($x_{111}, \dots, x_{nnn}$) returns
1 iff 1 generates $n$. Gen($n^3$) is clearly a monotone Boolean function
computable by polynomial-size monotone Boolean circuits. On the other
hand, any monotone formula computing Gen($n^3$) is of size $2^{n^{\varepsilon}}$, for some
constant $\varepsilon > 0$ [28] .
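To fix the semantics of the three functions, here is a brute-force reference sketch. It is meant only for experimenting with tiny inputs (it runs in exponential time and says nothing about circuit size); the set-based input encodings and the function names are assumptions made for illustration.

```python
from itertools import combinations, permutations

def clique(n, k, edges):
    """Clique(n, k): `edges` is a set of frozensets {i, j} with 1 <= i < j <= n."""
    return int(any(
        all(frozenset(pair) in edges for pair in combinations(subset, 2))
        for subset in combinations(range(1, n + 1), k)
    ))

def matching(n, edges):
    """Matching(2n): `edges` is a set of pairs (i, j), left vertex i, right vertex j."""
    return int(any(
        all((i + 1, pi[i]) in edges for i in range(n))
        for pi in permutations(range(1, n + 1))
    ))

def gen(n, triples):
    """Gen(n^3): `triples` is the set of (i, j, k) with x_ijk = 1."""
    generated = {1}          # 1 generates itself
    changed = True
    while changed:           # least fixed point of the generation rule
        changed = False
        for (i, j, k) in triples:
            if i in generated and j in generated and k not in generated:
                generated.add(k)
                changed = True
    return int(n in generated)
```

All three functions are monotone in the sense that adding elements to the input set can only turn the output from 0 to 1.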

The complexity results above will be used in Section 7 to obtain similar bounds
on the size of rewritings for certain CQs and OWL 2 QL ontologies encoding
these three functions. The encoding will require a representation of these
functions in terms of CNF.

More specifically, for each of these functions — their duals, to be more 
precise, which are in CONP — we construct an unsatisfiable CNF such that its 
clauses correspond to the variables of the function, and the function returns 1
on an input $x$ iff the CNF is unsatisfiable even if we remove all of its clauses
corresponding to those variables that are 0 in $x$. The construction is similar to
the classic proof of the Cook-Levin theorem stating that satisfiability of Boolean 
formulas is NP-complete, see for example [25l [3], 

Let $f$ be a monotone Boolean function in coNP with $f(\vec{1}) = 1$ (that is,
$f \not\equiv 0$). Fix a Boolean circuit $C$ of polynomial size with inputs $x$, $y$ and a
polynomial $p$ such that $f(x) = 0$ iff $C(x, y) = 0$, for some $y \in \{0,1\}^{p(|x|)}$; such a
circuit exists since P $\subseteq$ P/poly (see the definition of coNP functions above).

Let us also fix an enumeration $\mathfrak{g}_1, \dots, \mathfrak{g}_\ell$ of the gates in $C$ such that whenever
the output of $\mathfrak{g}_i$ is an input of $\mathfrak{g}_j$ then $i < j$ (thus, $\mathfrak{g}_\ell$ is the sink of $C$).

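Now we associate with $f$ and $C$ a CNF $\varphi_f$ with the propositional variables
$x = (x_1, \dots, x_n)$, $y = (y_1, \dots, y_m)$ and $g = (g_1, \dots, g_\ell)$ (the variables $g$ correspond
to the gates $\mathfrak{g}_i$ in the enumeration above) and the following clauses:

(c1) $x_i$, for $1 \le i \le n$,

(c2) $\neg g_\ell$,

(c3) $h \vee g_i$, $\neg h \vee \neg g_i$, for $\mathfrak{g}_i = \neg \mathfrak{h}$,

(c4) $h_1 \vee \neg g_i$, $h_2 \vee \neg g_i$ and $\neg h_1 \vee \neg h_2 \vee g_i$, for $\mathfrak{g}_i = \mathfrak{h}_1 \wedge \mathfrak{h}_2$,

(c5) $\neg h_1 \vee g_i$, $\neg h_2 \vee g_i$ and $h_1 \vee h_2 \vee \neg g_i$, for $\mathfrak{g}_i = \mathfrak{h}_1 \vee \mathfrak{h}_2$.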

Clauses (c3)-(c5) encode the correct computation of the circuit $C$. For example,
the conjunction of the clauses of type (c4), for some $\mathfrak{g}_i$, is equivalent to the
formula $g_i \leftrightarrow h_1 \wedge h_2$. More specifically, the following holds:

Claim 2. Consider the CNF consisting of clauses (c3)-(c5), and let $(x, y, g)$
be some satisfying assignment for this CNF. Then, for any node $\mathfrak{h}$ of $C$, we
have

$$\mathfrak{h}(x, y) = h,$$

where $\mathfrak{h}(x, y)$ denotes the value of $\mathfrak{h}$ in $C$ on the inputs $x$ and $y$, and $h$ the
element of $(x, y, g)$ for the node $\mathfrak{h}$.

Proof. The claim is proved by induction on the node number. The basis of
induction (inputs) is by definition. Consider three cases depending on the type
of gate $\mathfrak{g}$:

1. if $\mathfrak{g} = \neg \mathfrak{h}$, we have $\mathfrak{g}(x, y) = \neg \mathfrak{h}(x, y) = \neg h = g$, where the second equality
is by the induction hypothesis, and the third by clauses (c3);

2. if $\mathfrak{g} = \mathfrak{h}_1 \wedge \mathfrak{h}_2$, we have $\mathfrak{g}(x, y) = 1$ iff $\mathfrak{h}_1(x, y) = 1$ and $\mathfrak{h}_2(x, y) = 1$; by the
induction hypothesis, this is equivalent to $h_1 = 1$ and $h_2 = 1$, which, by
clauses (c4), is equivalent to $g = 1$;

3. if $\mathfrak{g} = \mathfrak{h}_1 \vee \mathfrak{h}_2$, we have $\mathfrak{g}(x, y) = 0$ iff $\mathfrak{h}_1(x, y) = 0$ and $\mathfrak{h}_2(x, y) = 0$; by the
induction hypothesis, this is equivalent to $h_1 = 0$ and $h_2 = 0$, which, by
clauses (c5), is equivalent to $g = 0$.

□

Suppose $\varphi_f = \bigwedge_{i=1}^{d} D_i$. We assume that the clauses $D_1, \dots, D_n$ are of the
form (c1) (that is, $D_i = x_i$), while the remaining $D_{n+1}, \dots, D_d$ are of the form
(c2)-(c5). Let $\alpha = (\alpha_1, \dots, \alpha_n) \in \{0,1\}^n$. Denote by $\varphi_f(\alpha)$ the formula
obtained from $\varphi_f$ by removing those clauses $D_i$ for which the corresponding $\alpha_i$
in $\alpha$ are equal to 0; that is, set

$$\varphi_f(\alpha) = \bigwedge_{\alpha_i = 1} D_i \;\wedge \bigwedge_{j=n+1}^{d} D_j. \qquad (1)$$

Lemma 3. $\varphi_f(\alpha)$ is satisfiable iff $f(\alpha) = 0$.

Proof. ($\Leftarrow$) Suppose $f(\alpha) = 0$. This means that $C(\alpha, y) = 0$, for some $y$. We
show that $\varphi_f(\alpha)$ is satisfiable. Define an assignment to the variables in $\varphi_f(\alpha)$
as follows: the variables in $x$ and $y$ take the values $\alpha$ and $y$, while the values for
the variables $g_i \in g$ are given by the outputs of the corresponding gates $\mathfrak{g}_i$ in
$C(\alpha, y)$. By Claim 2, this assignment makes all the clauses of $\varphi_f(\alpha)$ true.

($\Rightarrow$) Conversely, suppose $\varphi_f(\alpha)$ is satisfiable under some assignment $(x, y, g)$
of truth-values to the propositional variables $(x, y, g)$. By (c2), $g_\ell = 0$; by (c1),
$\alpha \le x$. As $f$ is monotone, it is enough to prove that $f(x) = 0$. By Claim 2, the
values of the variables $g$ are equal to the outputs of the corresponding gates of
the circuit $C$ on the input $(x, y)$. Thus, the output value of $C(x, y)$ is $g_\ell = 0$.
It follows that $f(x) = 0$. □
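Lemma 3 can be checked exhaustively on a toy instance. The sketch below assumes the same hypothetical gate-list encoding as a topologically ordered circuit (input nodes $x$, then advice nodes $y$, then gates), builds the clauses (c1)-(c5) of $\varphi_f(\alpha)$, and compares satisfiability with $f(\alpha) = 0$ by brute force. The example circuit and all names are illustrative choices, not taken from the paper.

```python
from itertools import product

def run(gates, inputs):
    """Values of all nodes: inputs first, then gates in topological order."""
    vals = list(inputs)
    for g in gates:
        if g[0] == "not":
            vals.append(1 - vals[g[1]])
        elif g[0] == "and":
            vals.append(vals[g[1]] & vals[g[2]])
        else:  # "or"
            vals.append(vals[g[1]] | vals[g[2]])
    return vals

def cnf(gates, n, m, alpha):
    """Clauses of phi_f(alpha); a literal is (node index, required truth value)."""
    clauses = [[(i, 1)] for i in range(n) if alpha[i] == 1]   # (c1), kept iff alpha_i = 1
    sink = n + m + len(gates) - 1
    clauses.append([(sink, 0)])                                # (c2)
    for t, g in enumerate(gates):
        gi = n + m + t
        if g[0] == "not":                                      # (c3)
            h = g[1]
            clauses += [[(h, 1), (gi, 1)], [(h, 0), (gi, 0)]]
        elif g[0] == "and":                                    # (c4)
            h1, h2 = g[1], g[2]
            clauses += [[(h1, 1), (gi, 0)], [(h2, 1), (gi, 0)],
                        [(h1, 0), (h2, 0), (gi, 1)]]
        else:                                                  # (c5)
            h1, h2 = g[1], g[2]
            clauses += [[(h1, 0), (gi, 1)], [(h2, 0), (gi, 1)],
                        [(h1, 1), (h2, 1), (gi, 0)]]
    return clauses

def satisfiable(clauses, num_vars):
    """Brute-force satisfiability test over all assignments."""
    return any(all(any(v[var] == val for var, val in c) for c in clauses)
               for v in product((0, 1), repeat=num_vars))

def f_from_circuit(gates, n, m, x):
    """f(x) = 0 iff C(x, y) = 0 for some advice vector y."""
    return min(run(gates, tuple(x) + y)[-1] for y in product((0, 1), repeat=m))

# Illustrative circuit: C(x, y) = (x1 or y1) and (x2 or not y1), hence f = x1 and x2.
GATES = [("not", 2), ("or", 0, 2), ("or", 1, 3), ("and", 4, 5)]
N, M = 2, 1

def lemma3_holds():
    total = N + M + len(GATES)
    return all(
        satisfiable(cnf(GATES, N, M, a), total) == (f_from_circuit(GATES, N, M, a) == 0)
        for a in product((0, 1), repeat=N)
    )
```

For this circuit, $\varphi_f(\alpha)$ is unsatisfiable only for $\alpha = (1, 1)$, the single input on which $f$ returns 1.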

Thus, for any monotone Boolean function in coNP, we constructed a polynomial-size
CNF representing it in the sense of Lemma 3. To make our upper
bound results in Section 7 more precise, we require sharper estimations of the
size of these CNFs for the functions Clique(n, k) and Matching(2n).

It is not hard to see that Clique(n, k) can be computed by a nondeterministic
circuit $C$ (with advice variables) of size $O(n^2)$. Indeed, this circuit gets the
edges of the graph as $x \in \{0,1\}^{n(n-1)/2}$ and the vector $y \in \{0,1\}^n$ indicating
those vertices of the graph that form the desired clique. The circuit $C$ checks
whether there are $k$-many 1's in $y$, which requires $O(n \log n)$ operations, and
whether any two vertices given by 1's in $y$ are connected by an edge in $x$, which
requires $O(n^2)$ additional operations. Finally, $C$ takes the conjunction of the
results of all these checks, which takes $O(n^2)$ more operations. Thus, the CNF
corresponding to this circuit will have $O(n^2)$ clauses and $O(n^2)$ variables.

Matching(2n) can also be computed by a Boolean nondeterministic circuit
of size $O(n^2)$. This circuit gets the edges of the bipartite graph as $x$ and the
edges of the desired perfect matching as $y$. For each vertex, the circuit has
to check whether there is exactly one edge in $y$ containing it, which requires
$O(n^2)$ operations. Also, the circuit has to check, for each pair of vertices, that
whenever there is an edge between them in $y$ then there is an edge between
them in $x$. This takes $O(n^2)$ additional operations. Finally, the circuit takes
the conjunction of these two checks. The CNF corresponding to this circuit will
have $O(n^2)$ clauses and $O(n^2)$ variables.

5 CNFs and OBDA 

In the previous section, given a family of monotone Boolean functions $f$, which
is in coNP and for which $f(\vec{1}) = 1$, we defined a family of CNFs $\varphi_f$. Now we
use $\varphi_f$ to construct a family of OWL 2 QL TBoxes $\mathcal{T}_f$ and CQs $q_f$.

Let $\varphi_f = \bigwedge_{j=1}^{d} D_j$ and let $p_i$, for $1 \le i \le N$, be all the propositional variables
in $\varphi_f$. Recall that the clauses $D_1, \dots, D_n$ are of the form (c1) (that is, $D_i = x_i$),
while the remaining $D_{n+1}, \dots, D_d$ are of the form (c2)-(c5). Consider the
acyclic OWL 2 QL TBox $\mathcal{T}_f$ with the following axioms, where $1 \le i \le N$,
$1 \le j \le d$ and $l = 0, 1$:

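$$
\begin{array}{ll}
A_{i-1} \sqsubseteq \exists P^-.X_i^l, & \\
X_i^l \sqsubseteq A_i, & \\
X_i^0 \sqsubseteq C_{i,j} & \text{if } \neg p_i \in D_j, \\
X_i^1 \sqsubseteq C_{i,j} & \text{if } p_i \in D_j, \\
C_{i,j} \sqsubseteq \exists P.C_{i-1,j}, & \\
A_0 \sqcap A_i \sqsubseteq \bot, & \\
A_0 \sqcap \exists P \sqsubseteq \bot, & \\
A_0 \sqcap C_{i,j} \sqsubseteq \bot, & \text{if } (i,j) \notin \{(0,1), \dots, (0,n)\}.
\end{array}
$$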
It is not hard to check that $|\mathcal{T}_f| = O(N \cdot d)$. Consider also the CQ

$$
q_f = \exists \vec{y}\, \exists \vec{z}\, \Big[ A_0(y_0) \wedge \bigwedge_{i=1}^{N} P(y_i, y_{i-1}) \wedge \bigwedge_{j=1}^{d} \Big( P(y_N, z_{N-1,j}) \wedge \bigwedge_{i=1}^{N-1} P(z_{i,j}, z_{i-1,j}) \wedge C_{0,j}(z_{0,j}) \Big) \Big],
$$

where $\vec{y} = (y_0, \dots, y_N)$ and $\vec{z} = (z_{0,1}, \dots, z_{N-1,1}, \dots, z_{0,d}, \dots, z_{N-1,d})$. Clearly,
$q_f$ is of size $O(N \cdot d)$. The canonical model $\mathcal{C}_{(\mathcal{T}_f, \{A_0(a)\})}$ of $(\mathcal{T}_f, \{A_0(a)\})$
and the query $q_f$ are illustrated in Fig. 1, where an arrow from a point $u$ to a
point $v$ means that $(u, v) \in P$.



Figure 1: Canonical model $\mathcal{C}_{(\mathcal{T}_f, \{A_0(a)\})}$ and query $q_f$.

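We are interested in the ABoxes $\mathcal{A}$ such that

$$\{A_0(a)\} \subseteq \mathcal{A} \subseteq \{A_0(a), C_{0,1}(a), \dots, C_{0,n}(a)\}$$

(in particular, $\mathrm{ind}(\mathcal{A}) = \{a\}$). We call such ABoxes $f$-suitable. Given an
$f$-suitable ABox $\mathcal{A}$, denote by $\mathcal{D}_{\mathcal{A}}$ the set of clauses $\{D_i \mid C_{0,i}(a) \in \mathcal{A}\}$. We
also define $\alpha_{\mathcal{A}} = (x_1, \dots, x_n) \in \{0,1\}^n$ by taking $x_i = 0$ iff $D_i \in \mathcal{D}_{\mathcal{A}}$. (Recall
that $D_i = x_i$, $1 \le i \le n$.) Thus, $\varphi_f(\alpha_{\mathcal{A}})$ is the CNF that results from $\varphi_f$ by
removing all the clauses that occur in $\mathcal{D}_{\mathcal{A}}$.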
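The correspondence between $f$-suitable ABoxes and input vectors is a simple bijection, which the following minimal sketch spells out. The atom encoding (`("A0",)` for $A_0(a)$ and `("C0", i)` for $C_{0,i}(a)$) and the function names are hypothetical, chosen only for illustration.

```python
def is_f_suitable(abox, n):
    """An ABox over the single individual a is f-suitable if it contains A0(a)
    and otherwise only atoms C_{0,1}(a), ..., C_{0,n}(a)."""
    allowed = {("A0",)} | {("C0", i) for i in range(1, n + 1)}
    return ("A0",) in abox and abox <= allowed

def alpha(abox, n):
    """alpha_A: component i is 0 iff C_{0,i}(a) is in the ABox, i.e. iff the
    clause D_i = x_i is removed from phi_f."""
    return tuple(0 if ("C0", i) in abox else 1 for i in range(1, n + 1))

def abox_of(alpha_vec):
    """Inverse direction: the f-suitable ABox encoding a given input vector."""
    return {("A0",)} | {("C0", i + 1) for i, bit in enumerate(alpha_vec) if bit == 0}
```

In particular, the $2^n$ $f$-suitable ABoxes stand in one-to-one correspondence with the vectors in $\{0,1\}^n$.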

Lemma 4. The following hold for the signature of $\mathcal{T}_f$ with a single constant:

(i) Suppose that $q'$ is an FO-sentence such that $(\mathcal{T}_f, \mathcal{A}) \models q_f$ iff $\mathcal{I}_{\mathcal{A}} \models q'$,
for any $f$-suitable $\mathcal{A}$. Then$^3$

$$q'' = \exists x\, \Big[ A_0(x) \wedge \Big( q' \vee \bigvee_{A_0 \sqcap B \sqsubseteq_{\mathcal{T}_f} \bot} B(x) \Big) \Big]$$

is an FO-rewriting for $q_f$ and $\mathcal{T}_f$ with $|q''| = |q'| + O(N \cdot d)$.

(ii) Suppose $(\Pi, G)$ is a pure nonrecursive Datalog program with a propositional
goal $G$ such that $(\mathcal{T}_f, \mathcal{A}) \models q_f$ iff $\Pi, \mathcal{A} \models G$, for any $f$-suitable $\mathcal{A}$. Then
$(\Pi', G')$ is an NDL-rewriting for $q_f$ and $\mathcal{T}_f$ with $|\Pi'| = |\Pi| + O(N \cdot d)$, where
$G'$ is a fresh propositional variable and $\Pi'$ is obtained by extending $\Pi$ with the
following clauses:

- $\forall x\, (A_0(x) \wedge B(x) \to G')$, for all concepts $B$ such that $A_0 \sqcap B \sqsubseteq_{\mathcal{T}_f} \bot$,

- $\forall x\, (A_0(x) \wedge G \to G')$.

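Proof. (i) The queries $q'$ and $q''$ give the same answer over any $f$-suitable ABox.
Consider a non-$f$-suitable ABox $\mathcal{A}'$ in the signature of $\mathcal{T}_f$ with $\mathrm{ind}(\mathcal{A}') = \{a\}$.
If $A_0(a) \notin \mathcal{A}'$ then we clearly have both $(\mathcal{T}_f, \mathcal{A}') \not\models q_f$ and $\mathcal{I}_{\mathcal{A}'} \not\models q''$. If $\mathcal{A}'$
contains $A_0(a)$ and any ground atom in the signature of $\mathcal{T}_f$ that is different from
$A_0(a), C_{0,1}(a), \dots, C_{0,n}(a)$ then $(\mathcal{T}_f, \mathcal{A}')$ is inconsistent, and so $(\mathcal{T}_f, \mathcal{A}') \models q_f$.
On the other hand, we clearly have $\mathcal{I}_{\mathcal{A}'} \models q''$.

(ii) is proved in the same way. The programs $(\Pi, G)$ and $(\Pi', G')$ give the
same answer over any $f$-suitable ABox. Consider a non-$f$-suitable ABox $\mathcal{A}'$ in
the signature of $\mathcal{T}_f$ with $\mathrm{ind}(\mathcal{A}') = \{a\}$. If $A_0(a) \notin \mathcal{A}'$ then we clearly have
both $(\mathcal{T}_f, \mathcal{A}') \not\models q_f$ and $\Pi', \mathcal{A}' \not\models G'$. If $\mathcal{A}'$ contains $A_0(a)$ and any ground
atom in the signature of $\mathcal{T}_f$ that is different from $A_0(a), C_{0,1}(a), \dots, C_{0,n}(a)$
then $(\mathcal{T}_f, \mathcal{A}')$ is inconsistent, and so $(\mathcal{T}_f, \mathcal{A}') \models q_f$. On the other hand, we
clearly have $\Pi', \mathcal{A}' \models G'$. □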

Remark 5. It is worth noting that the lemma above can be extended to an
arbitrary signature (that is, to ABoxes with arbitrarily many individuals) provided
that equality is available in rewritings. We refer to FO-rewritings with =
as FO$^=$-rewritings.

(i') Suppose that $q'$ is an FO-sentence such that $(\mathcal{T}_f, \mathcal{A}) \models q_f$ iff $\mathcal{I}_{\mathcal{A}} \models q'$,
for any $f$-suitable $\mathcal{A}$. Then there is an FO$^=$-rewriting $q''$ for $q_f$ and $\mathcal{T}_f$ such
that $|q''| \le |q'| + p(N \cdot d)$, for some polynomial $p$.

(ii') Suppose that $(\Pi, G)$ is a pure nonrecursive Datalog program with a propositional
goal $G$ such that $(\mathcal{T}_f, \mathcal{A}) \models q_f$ iff $\Pi, \mathcal{A} \models G$, for any $f$-suitable $\mathcal{A}$. Then
there is an NDL-rewriting $(\Pi', G')$ for $q_f$ and $\mathcal{T}_f$ such that $|\Pi'| \le |\Pi| + p(N \cdot d)$,
for some polynomial $p$.

The proof uses the polynomial 'impure' PE- and NDL-rewritings of Section 8
and [17]. To show (i'), let $\gamma$ be the PE-rewriting for $q_f$ and $\mathcal{T}_f$ to be given in
Section 8. We assume that this rewriting uses only two constants, say 0 and
1. Now, given an FO-sentence $q'$ that is evaluated over ABoxes with a single



3 Here and below, $B(x)$ denotes $\exists y\, P(x, y)$ in the case of $B = \exists P$.
individual only, we can clearly construct a quantifier-free FO-formula $q_0(x)$ in the
signature of $q'$ such that it contains no constants and $\mathcal{I}_{\mathcal{A}} \models q_0(a)$ iff $\mathcal{I}_{\mathcal{A}} \models q'$, for
all ABoxes $\mathcal{A}$ with a single individual $a$. Consider now the following FO-sentence

$$q'' = \exists x\, \Big[ A_0(x) \wedge \Big( q_0(x) \vee \bigvee_{A_0 \sqcap B \sqsubseteq_{\mathcal{T}_f} \bot} B(x) \vee \exists y\, \big(P(x, y) \wedge \gamma[0/x, 1/y]\big) \Big) \Big],$$

where $\gamma[0/x, 1/y]$ is the result of replacing each occurrence of 0 in $\gamma$ with $x$ and
each occurrence of 1 with $y$.

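Suppose $(\mathcal{T}_f, \mathcal{A}) \models q_f$. Then either $(\mathcal{T}_f, \mathcal{A})$ is inconsistent or $\mathcal{A}$ has an
individual $a_0$ such that $(\mathcal{T}_f, \mathcal{A}) \models q_f(a_0)$, where $q_f(a_0)$ is the query $q_f$ with $y_0$
replaced by $a_0$. In the former case, by the second disjunct, we have $\mathcal{I}_{\mathcal{A}} \models q''$,
which is a correct positive answer. In the latter case, if there is a distinct $a_1$
with $P(a_1, a_0)$ in $\mathcal{A}$ then the rewriting $\gamma$ provides the correct positive answer
and, by the third disjunct, $\mathcal{I}_{\mathcal{A}} \models q''$. Finally, if neither of the above cases is
applicable to $a_0$ then $\mathcal{A}_{a_0} = \{D(a_0) \mid D(a_0) \in \mathcal{A}, D \text{ is a concept name}\}$ is an
$f$-suitable ABox, in which case the correct positive answer is given by $q_0(a_0)$.

Conversely, suppose $(\mathcal{T}_f, \mathcal{A}) \not\models q_f$. Then $(\mathcal{T}_f, \mathcal{A})$ is consistent, and so, the
second disjunct is false. If there is no $a_0$ with $A_0(a_0) \in \mathcal{A}$ then, clearly, $\mathcal{I}_{\mathcal{A}} \not\models q''$.
So, take an arbitrary individual $a_0$ such that $A_0(a_0) \in \mathcal{A}$. If $P(a_1, a_0) \in \mathcal{A}$, for
some $a_1$ (distinct from $a_0$ due to consistency) then, on the one hand, we have
$\mathcal{I}_{\mathcal{A}} \not\models \gamma[0/a_0, 1/a_1]$ and so, the third disjunct is false. On the other hand, for
an $f$-suitable ABox $\mathcal{A}_{a_0} = \{D(a_0) \mid D(a_0) \in \mathcal{A}, D \text{ is a concept name}\}$, we have
$(\mathcal{T}_f, \mathcal{A}_{a_0}) \models q_f$ iff $\mathcal{I}_{\mathcal{A}_{a_0}} \models q'$ iff $\mathcal{I}_{\mathcal{A}_{a_0}} \models q_0(a_0)$. It follows that $\mathcal{I}_{\mathcal{A}} \not\models q_0(a)$, for
all individuals $a$ with $A_0(a) \in \mathcal{A}$, and so, the first disjunct is false as well.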

Claim (ii') is proved in a similar way, using a modification of the polynomial-size
NDL-rewriting of [17]. (We note that in the short NDL-rewriting of [17]
the inequality predicate $\ne$ is applied only to terms that range over the extra
constants, and not ABox individuals, and therefore one can write a short program
defining $\ne$ by listing all pairs of non-equal constants.) Let $(\Delta, Q(z_0, z_1))$
be a nonrecursive Datalog program of the short impure rewriting for $q_f$ and $\mathcal{T}_f$,
which uses $z_0$ and $z_1$ for the constants 0 and 1. Next, given a nonrecursive Datalog
program $(\Pi, G)$ that is evaluated over ABoxes with a single individual only,
we can construct a new nonrecursive Datalog program $(\Pi_0, G_0(x))$ such that all
predicates of $\Pi_0$ are unary, all clauses have a single variable and $\Pi_0, \mathcal{A} \models G_0(a)$
iff $\Pi, \mathcal{A} \models G$, for all ABoxes $\mathcal{A}$ with a single individual $a$. Consider now $(\Pi', G')$,
where $G'$ is a fresh propositional variable and $\Pi'$ consists of $\Pi_0$, $\Delta$ and the
following three clauses:

- $\forall x\, (A_0(x) \wedge B(x) \to G')$, for all concepts $B$ with $A_0 \sqcap B \sqsubseteq_{\mathcal{T}_f} \bot$,

- $\forall x\, (A_0(x) \wedge G_0(x) \to G')$,

- $\forall x, y\, (A_0(x) \wedge P(y, x) \wedge Q(x, y) \to G')$.

Suppose $(\mathcal{T}_f, \mathcal{A}) \models q_f$. Then either $(\mathcal{T}_f, \mathcal{A})$ is inconsistent or $\mathcal{A}$ has an individual
$a_0$ such that $(\mathcal{T}_f, \mathcal{A}) \models q_f(a_0)$, where $q_f(a_0)$ is the query $q_f$ with $y_0$ replaced
by $a_0$. In the former case, by the first clause, we have $\Pi', \mathcal{A} \models G'$, which is a
correct positive answer. In the latter case, if there is a distinct $a_1$ with $P(a_1, a_0)$
in $\mathcal{A}$ then the program $\Delta$ provides the correct positive answer and, by the third
clause, $\Pi', \mathcal{A} \models G'$. Finally, if neither of the above cases is applicable to $a_0$
then $\mathcal{A}_{a_0} = \{D(a_0) \mid D(a_0) \in \mathcal{A}, D \text{ is a concept name}\}$ is an $f$-suitable ABox,
in which case the correct positive answer is given by $\Pi_0$.

Conversely, suppose $(\mathcal{T}_f, \mathcal{A}) \not\models q_f$. Then $(\mathcal{T}_f, \mathcal{A})$ is consistent, and so, the
first clause is not applicable. If there is no $a_0$ with $A_0(a_0) \in \mathcal{A}$ then, clearly,
$\Pi', \mathcal{A} \not\models G'$. So, take an arbitrary individual $a_0$ such that $A_0(a_0) \in \mathcal{A}$. If
$P(a_1, a_0) \in \mathcal{A}$, for some $a_1$ (distinct from $a_0$ due to consistency) then, on the one
hand, $\Delta, \mathcal{A} \not\models Q(a_0, a_1)$ and so, the third clause cannot give a positive answer.
On the other hand, for $\mathcal{A}_{a_0} = \{D(a_0) \mid D(a_0) \in \mathcal{A}, D \text{ is a concept name}\}$, which
is an $f$-suitable ABox, $(\mathcal{T}_f, \mathcal{A}_{a_0}) \models q_f$ iff $\Pi, \mathcal{A}_{a_0} \models G$ iff $\Pi_0, \mathcal{A}_{a_0} \models G_0(a_0)$. It
follows that $\Pi_0, \mathcal{A} \not\models G_0(a)$, for all individuals $a$ with $A_0(a) \in \mathcal{A}$, and so, the
second clause cannot give a positive answer either.

Lemma 6. For every $f$-suitable ABox $\mathcal{A}$, $(\mathcal{T}_f, \mathcal{A}) \models q_f$ iff $\varphi_f(\alpha_{\mathcal{A}})$ is satisfiable.

Proof. ($\Rightarrow$) Consider an assignment $\mathfrak{a}$ of points in the canonical model $\mathcal{C}$ of
$(\mathcal{T}_f, \mathcal{A})$ to the bound variables of $q_f$ under which it holds true. Observe first
that there is a sequence $u_0, \dots, u_N$ of points in $\mathcal{C}$ such that $u_0 = a$,
$(u_i, u_{i-1}) \in P$, $u_i \in A_i$ and $\mathfrak{a}(y_i) = u_i$. For each variable $p_i$ in $\varphi_f$, we set
$p_i = 1$ if $u_i \in X_i^1$ and $p_i = 0$ otherwise. We show that $\varphi_f(\boldsymbol{a}_{\mathcal{A}})$ is true under
this assignment. Take any clause $D_j$ in $\varphi_f(\boldsymbol{a}_{\mathcal{A}})$ and consider $\mathfrak{a}(z_{0,j}) \in C_{0,j}$.
If $\mathfrak{a}(z_{0,j}) = a$ then $C_{0,j}(a) \in \mathcal{A}$, $x_j = 0$, and so $D_j$ (of type (c1)) does not
occur in $\varphi_f(\boldsymbol{a}_{\mathcal{A}})$. If $\mathfrak{a}(z_{0,j}) \ne a$ then some $u_i$ is in $C_{i,j}$, which means that
$D_j$ contains $p_i$ if $u_i \in X_i^1$ and $\neg p_i$ if $u_i \in X_i^0$. By the definition of the
assignment of truth-values to the variables in $\varphi_f$, $D_j$ is true under this
assignment.

($\Leftarrow$) Suppose $\varphi_f(\boldsymbol{a}_{\mathcal{A}})$ is true under some assignment of truth-values to the
propositional variables $p_1, \dots, p_N$. Recall that the canonical model for $(\mathcal{T}_f, \mathcal{A})$
contains a path $u_0, \dots, u_N$ from $a = u_0$ to some $u_N$ that corresponds to that
assignment in the following sense: $u_i \in X_i^0$ if $p_i = 0$ and $u_i \in X_i^1$ if $p_i = 1$.
We construct an assignment $\mathfrak{a}$ of points in the canonical model of $(\mathcal{T}_f, \mathcal{A})$ to
the variables in $q_f$ in accordance with this valuation. For $1 \le i \le N$, we
set $\mathfrak{a}(y_i) = u_i$. For $1 \le j \le d$, we define $\mathfrak{a}(z_{N-1,j}), \dots, \mathfrak{a}(z_{0,j})$ recursively,
starting from $\mathfrak{a}(z_{N-1,j})$: set $\mathfrak{a}(z_{i,j}) = \mathfrak{a}(z_{i+1,j})\, w_{[PC_{i,j}]}$ if $\mathfrak{a}(z_{i+1,j})$ is in $C_{i+1,j}$
and $\mathfrak{a}(z_{i,j}) = u_i$, otherwise (assuming that $z_{N,j} = y_N$). It is easy to check that
$q_f$ is true in the canonical model under this assignment. □

6 Quantifier Elimination 

Now we show how rewritings for $q_f$ and $\mathcal{T}_f$ can be transformed into Boolean
circuits computing $f$. Recall that the dual of $f(x_1, \dots, x_n)$ is the Boolean
function $f^*$ defined by

$$f^*(x_1, \dots, x_n) = \neg f(\neg x_1, \dots, \neg x_n).$$

Clearly, $f^{**} = f$.
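
The definition of the dual can be checked mechanically on small examples. The following is a minimal sketch, assuming Boolean functions are given as Python callables; the helper names `dual` and `same_function` and the sample functions `and3` and `or3` are illustrative choices and do not come from the paper.

```python
from itertools import product

def dual(f, n):
    """Return f*(x_1..x_n) = not f(not x_1, ..., not x_n) for an n-ary f."""
    def f_star(*xs):
        assert len(xs) == n
        return not f(*(not x for x in xs))
    return f_star

def same_function(g, h, n):
    """Compare two n-ary Boolean functions on all 2^n inputs."""
    return all(bool(g(*xs)) == bool(h(*xs))
               for xs in product([False, True], repeat=n))

# Illustrative monotone functions (assumptions of this sketch).
def and3(x, y, z):
    return x and y and z

def or3(x, y, z):
    return x or y or z

and3_dual = dual(and3, 3)             # the dual of conjunction is disjunction
and3_double_dual = dual(and3_dual, 3)  # dualising twice gives the function back
```

On these samples the dual of the conjunction agrees with the disjunction, and applying `dual` twice returns a function that agrees with the original on every input.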



Lemma 7. (i) Suppose that $q'_f$ is a PE-rewriting for $\mathcal{T}_f$ and $q_f$. Then there is
a monotone Boolean formula $\psi_f$ computing $f^*$ and such that $|\psi_f| \le |q'_f|$.

(ii) Suppose that $(\Pi_f, G)$ is an NDL-rewriting for $\mathcal{T}_f$ and $q_f$. Then there is
a monotone Boolean circuit $\mathbf{C}_f$ computing $f^*$ and such that $|\mathbf{C}_f| \le |\Pi_f|$.

(iii) Suppose that $q'_f$ is an FO-rewriting for $\mathcal{T}_f$ and $q_f$ and that the signature
$\Sigma$ contains a single constant. Then there is a Boolean formula $\psi_f$ computing
$f^*$ and such that $|\psi_f| \le |q'_f|$.

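Proof. (i) By Lemmas [3] and [6], for any PE-rewriting $q'_f$ for $q_f$ and $\mathcal{T}_f$ and any
$f$-suitable ABox $\mathcal{A}$, we have

$$\mathcal{I}_{\mathcal{A}} \models q'_f \quad\text{iff}\quad f(\boldsymbol{a}_{\mathcal{A}}) = 0.$$

Recall that, of all ground atoms in signature $\Sigma$, only $A_0(a)$ and the $C_{0,j}(a)$, for
$1 \le j \le n$, can be true in $\mathcal{I}_{\mathcal{A}}$. In particular, no predicate can be true in $\mathcal{I}_{\mathcal{A}}$ on
an element different from $a$. So we can replace all the individual variables in $q'_f$
with $a$, remove all the existential quantifiers and replace $A_0(a)$ with $\top$ and all the
atoms different from $A_0(a)$ and $C_{0,j}(a)$, for $1 \le j \le n$, with $\bot$ without affecting
the truth-value of $q'_f$ in $\mathcal{I}_{\mathcal{A}}$. Denote the resulting quantifier-free Boolean query
by $q^\dagger_f$. Then, for any $f$-suitable ABox $\mathcal{A}$, we have

$$\mathcal{I}_{\mathcal{A}} \models q^\dagger_f \quad\text{iff}\quad f(\boldsymbol{a}_{\mathcal{A}}) = 0.$$

The formula $q^\dagger_f$ is in fact a propositional formula, $\psi_f$, with the connectives $\land$, $\lor$
and the propositional variables $C_{0,j}(a)$, for $1 \le j \le n$, such that $\mathcal{I}_{\mathcal{A}} \models C_{0,j}(a)$
iff the $j$-th component of $\boldsymbol{a}_{\mathcal{A}}$ is 0. Thus, $\psi_f$ is equivalent to $f^*$ and, clearly,
$|\psi_f| \le |q'_f|$.
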
(iii) If, in addition, $\Sigma$ contains only one constant, $a$, then in the same way
we can convert any FO-rewriting $q'_f$ for $q_f$ and $\mathcal{T}_f$, even with $\forall$ and $\neg$,
to a propositional formula with the variables $C_{0,j}(a)$, for $1 \le j \le n$, which is
equivalent to $f^*$ (the formulas of the form $\exists x\, \chi(x)$ and $\forall x\, \chi(x)$ are replaced
with $\chi(a)$).
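
The grounding step used in parts (i) and (iii) can be illustrated on a toy positive existential query. This is only a sketch under stated assumptions: the nested-tuple encoding of formulas, the function names `eliminate` and `evaluate`, and the sample query are hypothetical and chosen for illustration; only the substitution rules themselves (drop quantifiers, $A_0(a) \mapsto \top$, keep the $C_{0,j}(a)$, all other atoms $\mapsto \bot$) follow the proof.

```python
# Hypothetical formula encoding for this sketch:
#   ("atom", predicate, args), ("and", f, g), ("or", f, g), ("exists", var, f)

def eliminate(q, inputs):
    """Ground a PE-formula over the single individual a.

    Quantifiers are dropped (every variable denotes a), A0(a) becomes True,
    unary atoms whose predicate is in `inputs` stay as propositional
    variables, and every other atom becomes False.
    """
    tag = q[0]
    if tag == "exists":
        return eliminate(q[2], inputs)
    if tag in ("and", "or"):
        return (tag, eliminate(q[1], inputs), eliminate(q[2], inputs))
    _, pred, args = q
    if pred == "A0" and len(args) == 1:
        return True
    if pred in inputs and len(args) == 1:
        return ("var", pred)
    return False

def evaluate(psi, true_vars):
    """Evaluate the resulting monotone propositional formula."""
    if isinstance(psi, bool):
        return psi
    if psi[0] == "var":
        return psi[1] in true_vars
    left = evaluate(psi[1], true_vars)
    right = evaluate(psi[2], true_vars)
    return (left and right) if psi[0] == "and" else (left or right)

# Toy query: exists x (A0(x) and (C01(x) or exists y (P(y, x) and C02(y))))
q_example = ("exists", "x",
             ("and", ("atom", "A0", ("x",)),
              ("or", ("atom", "C01", ("x",)),
               ("exists", "y",
                ("and", ("atom", "P", ("y", "x")),
                 ("atom", "C02", ("y",)))))))
psi_example = eliminate(q_example, {"C01", "C02"})
```

In the toy query the binary atom `P(y, x)` is replaced by falsum, so the disjunct containing `C02` can never fire and the grounded formula depends on `C01` only. The grounded formula has at most as many connectives and atoms as the query it came from, which is the size relation the lemma relies on.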

(ii) Suppose now that $(\Pi_f, G)$ is an NDL-rewriting for $q_f$ and $\mathcal{T}_f$ over a given
signature $\Sigma$, containing $a$ among its constants, and $\mathcal{A}$ is an $f$-suitable ABox.
Then, for any ground $\Sigma$-atom $Q(t_1, \dots, t_l)$ with at least one $t_i$ different from
$a$, we have $\Pi_f, \mathcal{A} \not\models Q(t_1, \dots, t_l)$ (which can be easily proved by induction on
the length of derivations using the fact that $\Pi_f$ is pure and each variable that
occurs in the head of a clause must also occur in its body). So we can again
replace all the individual variables in $\Pi_f$ with $a$, $A_0(a)$ with $\top$ and all the atoms
that do not occur in the head of a clause and are different from $A_0(a)$ and
$C_{0,j}(a)$, for $1 \le j \le n$, with $\bot$. Denote the resulting propositional nonrecursive
Datalog program by $\Pi^\dagger_f$. We then obtain

$$\Pi^\dagger_f, \mathcal{A} \models G \quad\text{iff}\quad f(\boldsymbol{a}_{\mathcal{A}}) = 0.$$

The program $\Pi^\dagger_f$ can now be transformed into a monotone Boolean circuit $\mathbf{C}_f$
computing $f^*$. For every (propositional) variable $p$ occurring in the head of
a clause in $\Pi^\dagger_f$, we introduce a $\lor$-gate whose output is $p$ and inputs are the
bodies of the clauses with head $p$. And for each such body, we introduce a
$\land$-gate whose inputs are the propositional variables in the body. The resulting
monotone Boolean circuit with sources $C_{0,j}(a)$, for $1 \le j \le n$, and sink $G$ is
denoted by $\mathbf{C}_f$. Clearly, $|\mathbf{C}_f| \le |\Pi_f|$. □
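
The translation from a propositional nonrecursive Datalog program to a monotone circuit can be sketched as follows. The representation of clauses as `(head, body)` pairs, the function names and the sample program are assumptions of this sketch; the gate layout (one OR-gate per head variable, one AND-gate per clause body) follows the construction in the proof.

```python
def eval_program(clauses, goal, true_inputs):
    """Evaluate a propositional nonrecursive Datalog program as a circuit.

    clauses: list of (head, body) pairs, body being a list of variables.
    Variables that never occur as a head are sources: they are true iff
    they belong to true_inputs.
    """
    bodies = {}
    for head, body in clauses:
        bodies.setdefault(head, []).append(body)
    cache = {}

    def value(p):
        if p not in bodies:                      # source of the circuit
            return p in true_inputs
        if p not in cache:
            # OR-gate over the clauses with head p, AND-gate inside each body
            cache[p] = any(all(value(r) for r in body) for body in bodies[p])
        return cache[p]

    return value(goal)

def gate_count(clauses):
    """One OR-gate per head variable plus one AND-gate per clause."""
    return len({head for head, _ in clauses}) + len(clauses)

# Illustrative program: G <- C01, H;  H <- C02;  H <- C03
sample_program = [("G", ["C01", "H"]), ("H", ["C02"]), ("H", ["C03"])]
```

Because the program is nonrecursive, the recursion in `value` terminates, and the number of gates is linear in the number of clauses. The sample program computes `C01 and (C02 or C03)`.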

We are in a position now to prove our main theorem which connects the size 
of circuits computing monotone Boolean functions with the size of rewritings 
for the corresponding ontologies and queries. 

Theorem 8. For any family $f^n$, $n \ge 1$, of monotone Boolean functions, which
is in NP, there exist polynomial-size OWL 2 QL TBoxes $\mathcal{T}_n$ and CQs $q_n$ such
that the following hold:

(1) Let $L(n)$ be a lower bound for the size of monotone Boolean formulas
computing $f^n$. Then, for any PE-rewriting $q'_n$ for $\mathcal{T}_n$ and $q_n$ (over any suitable
signature), $|q'_n| \ge L(n)$.

(2) Let $L(n)$ and $U(n)$ be a lower and an upper bound for the size of monotone
Boolean circuits computing $f^n$, respectively. Then

- for any NDL-rewriting $(\Pi_n, G_n)$ for $\mathcal{T}_n$ and $q_n$ (over any suitable
signature), $|\Pi_n| \ge L(n)$;

- there exist a polynomial $p$ and an NDL-rewriting $(\Pi_n, G_n)$ for $\mathcal{T}_n$
and $q_n$ over any suitable signature with a single constant such that
$|\Pi_n| \le U(n) + p(n)$.

(3) Let $L(n)$ and $U(n)$ be, respectively, a lower and an upper bound for the size
of Boolean formulas computing $f^n$. Then

- for any FO-rewriting $q'_n$ for $\mathcal{T}_n$ and $q_n$ over any suitable signature
with a single constant, $|q'_n| \ge L(n)$;

- there exist a polynomial $p$ and an FO-rewriting $q'_n$ for $\mathcal{T}_n$ and $q_n$
over any suitable signature with a single constant such that
$|q'_n| \le U(n) + p(n)$.

Proof. (1) follows from Lemma [7] (i) for the dual $f^*$ of $f$, which is in coNP. The
first claim of (2) follows from Lemma [7] (ii). To prove the second claim, take
any circuit $\mathbf{C}^n$ computing $f^n$ and having size $\le U(n)$ and an $f^*$-suitable ABox
$\mathcal{A}$. By Lemmas [3] and [6], $(\mathcal{T}_n, \mathcal{A}) \models q_n$ iff $\neg\mathbf{C}^n(\neg\boldsymbol{a}_{\mathcal{A}}) = 0$ iff $\mathbf{C}^n(\neg\boldsymbol{a}_{\mathcal{A}}) = 1$. It
should be clear that $\mathbf{C}^n$ can be transformed into a nonrecursive propositional
Datalog program $(\Pi, G)$ of size $|\mathbf{C}^n|$ such that $\Pi, \mathcal{A} \models G$ iff $(\mathcal{T}_n, \mathcal{A}) \models q_n$. Then
we apply Lemma [5]. (3) is proved analogously. □
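
The converse translation used in the second claim of (2), from a monotone circuit to a propositional nonrecursive Datalog program, admits an equally short sketch. The gate encoding `(name, op, inputs)`, the function name and the sample circuit are assumptions made for illustration, not the paper's notation.

```python
def circuit_to_clauses(gates):
    """Turn a monotone circuit into propositional Datalog clauses.

    gates: list of (name, op, inputs) with op in {"AND", "OR"}.
    An AND-gate yields one clause whose body lists all its inputs;
    an OR-gate yields one single-atom clause per input.
    """
    clauses = []
    for name, op, inputs in gates:
        if op == "AND":
            clauses.append((name, list(inputs)))
        elif op == "OR":
            clauses.extend((name, [i]) for i in inputs)
        else:
            raise ValueError(f"non-monotone or unknown gate type: {op}")
    return clauses

# Illustrative circuit: G = (x1 OR x2) AND x3
sample_circuit = [("g1", "OR", ["x1", "x2"]), ("G", "AND", ["g1", "x3"])]
sample_clauses = circuit_to_clauses(sample_circuit)
```

Each gate contributes a number of clause atoms proportional to its fan-in, so for bounded fan-in the program is linear in the circuit size, and it is nonrecursive because the circuit is acyclic.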
7 Rewritings Long and Short

Now we apply Theorem [8] to the Boolean functions mentioned in Section [4] to
demonstrate that some queries and ontologies may only have very long
rewritings, and that rewritings of one type can be exponentially more succinct
than rewritings of another type.

First we show that one cannot avoid an exponential blow-up for PE- and
NDL-rewritings. We also show that even FO-rewritings can blow up
superpolynomially for signatures with a single constant under the assumption
that NP $\not\subseteq$ P/poly.

Theorem 9. There is a sequence of CQs $q_n$ of size $O(n)$ and OWL 2 QL TBoxes
$\mathcal{T}_n$ of size $O(n)$ such that:

- any PE-rewriting for $q_n$ and $\mathcal{T}_n$ (over any suitable signature) is of size at
least $2^{\Omega(n^{1/4})}$;

- any NDL-rewriting for $q_n$ and $\mathcal{T}_n$ (over any suitable signature) is of size
at least $2^{\Omega((n/\log n)^{1/12})}$;

- there does not exist a polynomial-size FO-rewriting for $q_n$ and $\mathcal{T}_n$ over
any suitable signature with a single constant unless NP $\subseteq$ P/poly.

Proof. Consider the function $f$ = Clique$(m, k)$ for $m = \lfloor n^{1/4} \rfloor$. Then the
number of clauses in the CNF $\varphi_f$ is $d = O(m^2)$ and the number of variables
in it is $N = O(m^2)$. The size of $q_n$ and $\mathcal{T}_n$ constructed in Section [5] is $O(n)$.
From Theorem [8] and the lower bounds for Clique$(m, k)$ in Section [4] we obtain
the lower bound for PE-rewritings if we set $k = \lfloor 2m/3 \rfloor = \Omega(n^{1/4})$ and the
bound for NDL-rewritings if we set $k = \lfloor (m/\log m)^{2/3} \rfloor = \Omega((n/\log n)^{1/6})$.
If we assume that NP $\not\subseteq$ P/poly then there is no polynomial-size circuit for
Clique$(m, k)$, since this function is NP-complete. From this we can deduce
that there is no polynomial-size FO-rewriting of $q_n$ over any $\Sigma$ containing only
one constant. □

Remark 10. By the Karp-Lipton theorem (see, e.g., [3]) NP $\subseteq$ P/poly implies
PH = $\Sigma_2^p$. Thus, in Theorem [9] we can replace the assumption NP $\not\subseteq$ P/poly
with PH $\ne \Sigma_2^p$.

Next we show that NDL-rewritings can be exponentially more succinct than 
PE-rewritings. 

Theorem 11. There is a sequence of CQs $q_n$ of size $O(n)$ and OWL 2 QL
TBoxes $\mathcal{T}_n$ of size $O(n)$ for which there exists a polynomial-size NDL-rewriting
over a signature with a single constant, but any PE-rewriting over this signature
is of size $\ge 2^{n^{\varepsilon}}$, for some $\varepsilon > 0$.

Proof. Consider the function $f$ = Gen$(m)$ with the number of variables $m$ to be
fixed later. By Theorem [8], there are a query $q_n$ and a theory $\mathcal{T}_n$ of size at most
$p(m)$, for some polynomial $p$. Choose $m$ in such a way that $p(m) = n$ (thus
$m = \Theta(n^{\delta})$ for a positive constant $\delta$). Using the bounds on the circuit complexity
of Gen stated in Section [4], we obtain a nonrecursive Datalog rewriting of $q_n$
and $\mathcal{T}_n$ of size poly$(n)$, but any PE-rewriting of $q_n$ and $\mathcal{T}_n$ has size at least
$2^{m^{\varepsilon}} \ge 2^{n^{\varepsilon'}}$ for positive constants $\varepsilon$ and $\varepsilon'$. □

FO-rewritings can also be substantially shorter than the PE-rewritings: 

Theorem 12. There is a sequence of CQs $q_n$ of size $O(n)$ and OWL 2 QL
TBoxes $\mathcal{T}_n$ of size $O(n)$ which has an FO-rewriting of size $n^{O(\log n)}$ over a
signature with a single constant, but any PE-rewriting over this signature is
of size $\ge 2^{\Omega(n^{1/4})}$.

Proof. Consider the function $f$ = Matching$(2m)$ with $m = \lfloor n^{1/4} \rfloor$. Then the
number of clauses in the CNF $\varphi_f$ is $d = O(m^2)$ and the number of variables in
it is $N = O(m^2)$. The size of both $q_n$ and $\mathcal{T}_n$, constructed as in Section [5], is
$O(n)$. By Theorem [8] and the bounds for Matching$(2m)$ in Section [4],
we obtain the required lower bound for PE-rewritings and upper bound for
FO-rewritings (note that $(n^{1/4})^{\log n^{1/4}} = n^{O(\log n)}$). □

In fact, we can use a standard trick from the circuit complexity theory 
to show that FO-rewritings can be superpolynomially more succinct than PE- 
rewritings. 

Theorem 13. There is a sequence of CQs $q_n$ of size $O(n)$ and OWL 2 QL TBoxes
$\mathcal{T}_n$ of size $O(n)$ which has a polynomial-size FO-rewriting over a signature with a
single constant, but any PE-rewriting over this signature is of size
$\ge 2^{\Omega(2^{\log^{1/2} n})}$.

Proof. Consider the function $f$ from the proof of Theorem [12] on
$m = \lfloor 2^{\log^{1/2} n} \rfloor$ variables and add to it $\lfloor n^{1/4} \rfloor - m$ new dummy variables. Then
again the length of both $q_n$ and $\mathcal{T}_n$, constructed as in Section [5], is $O(n)$. But
now Theorem [8] and the bounds for the complexity of Matching give the
$m^{O(\log m)} = n^{O(1)}$ upper bound on the size of FO-rewritings and the
$2^{\Omega(m)} = 2^{\Omega(2^{\log^{1/2} n})}$ lower bound for PE-rewritings. □

8 Short Impure Rewritings 

In the proof of Theorem [9], we used the CQs $q_n = q_f$, for $f$ = Clique$(n, k)$
and $k = \Omega(n^{1/4})$, containing no constant symbols. It follows that the theorem
will still hold if we allow the built-in predicates $=$ and $\ne$ in the rewritings,
but disallow the use of constants that do not occur in the original query. The
situation changes drastically if $=$, $\ne$ and two additional constants, say 0 and
1, are allowed in the rewritings. As shown by Gottlob and Schwentick [17], in
this case there is a polynomial-size NDL-rewriting for any CQ and OWL 2 QL
TBox. Roughly, the rewriting uses the extra expressive resources to encode in
a succinct way the part of the canonical model that is relevant to answering
the given query. We call rewritings of this kind impure (indicating thereby that
they use predicates and constants that do not occur in the original query and
ontology). In fact, using the ideas of [5] and [17], one can construct an impure
polynomial-size PE-rewriting for any CQ and OWL 2 QL TBox:

Theorem 14. For every CQ $q$ and every OWL 2 QL TBox $\mathcal{T}$, there is an
impure PE-rewriting $q'$ whose size is polynomial in $|q|$ and $|\mathcal{T}|$.

Proof. We illustrate the idea of the proof for a larger ontology language of
tuple-generating dependencies (TGDs). CQ answering under TGDs is undecidable in
general [8]. However, certain classes of TGDs (linear, sticky, etc.) enjoy
the so-called polynomial witness property (PWP) [17], which guarantees that,
for each CQ $q$ and each set $\mathcal{T}$ of TGDs from the class, there is a number $N$
polynomial in $|q|$ and $|\mathcal{T}|$ such that, for each database $\mathcal{A}$, there is a sequence of
$N$ chase steps that entail $q$. OWL 2 QL has PWP because its concept and role
inclusions are special cases of linear TGDs.

So, suppose we have a set $\mathcal{T}$ of TGDs from a class enjoying PWP. Without
loss of generality we may assume that all predicates are of arity $L$ and that all
TGDs have precisely $m$ atoms in the body, i.e., the TGDs are formulas of the
form

$$\forall \vec{x}\, \bigl(P_1(\vec{t}_1) \land \dots \land P_m(\vec{t}_m) \to \exists \vec{z}\, P_0(\vec{t}_0)\bigr),$$

where each vector $\vec{t}_1, \dots, \vec{t}_m$ consists of $L$ (not necessarily distinct) variables
from $\vec{x}$ (they are universally quantified) and each of the $L$ variables of $\vec{t}_0$ either
coincides with one of the $\vec{x}$ (in which case it is universally quantified) or is
taken from $\vec{z}$ (in which case it is existentially quantified). Consider a Boolean
CQ (without free variables)

$$q = \exists \vec{y} \bigwedge_{k=1}^{|q|} R_k(y_{k1}, \dots, y_{kL}).$$

By PWP, there is a number $N$ polynomial in $|\mathcal{T}|$ and $|q|$ such that, for any
ABox $\mathcal{A}$, the query $q$ is true on the first $N$ atoms of the chase for $\mathcal{T}$ and $\mathcal{A}$
(provided that $(\mathcal{T}, \mathcal{A}) \models q$). In essence, our PE-rewriting guesses these first $N$
ground atoms $\tau_1, \dots, \tau_N$ of the chase for $(\mathcal{T}, \mathcal{A})$ and then checks whether the
guess is a positive answer to $q$ and the atoms indeed form the steps of the chase
for $(\mathcal{T}, \mathcal{A})$. For each chase step $1 \le i \le N$, we will need the following variables:

- $u_{i1}, \dots, u_{iL}$ are the arguments of the ground atom $\tau_i$ and range over the
ABox domain and the labelled nulls $\mathit{null}_{ij}$, for $1 \le j \le L$ (all these labelled
nulls can be thought of as natural numbers not exceeding $N \cdot L$);

- $r_i$ is the number of the predicate of $\tau_i$ (each predicate name $P$ is given a
unique number, also denoted by $P$); so, $r_i$ with $u_{i1}, \dots, u_{iL}$ encode $\tau_i$;

- $w_{i1}, \dots, w_{il}$, where $l$ is the maximum length of the $\vec{x}$ in TGDs, are the
arguments of the body of the TGD that generated $\tau_i$; they also range over
the ABox domain and the labelled nulls (clearly, $l$ does not exceed $m \cdot L$).
The PE-rewriting is then defined by taking

$$q' = \exists \vec{y}\, \exists \vec{u}\, \exists \vec{r}\, \exists \vec{w}\, \Bigl( \bigwedge_{k=1}^{|q|} \bigvee_{i=1}^{N} \Bigl[ (r_i = R_k) \land \bigwedge_{j=1}^{L} (u_{ij} = y_{kj}) \Bigr] \land \bigwedge_{i=1}^{N} \bigvee \Phi_i \Bigr).$$

The first conjunct of the rewriting chooses, for each atom in the query, one of
the ground atoms $\tau_1, \dots, \tau_N$ in such a way that its predicate coincides with the
query atom's predicate and the arguments match. The second conjunct chooses,
for each ground atom $\tau_1, \dots, \tau_N$, the number of a TGD that produces it or 0, if
the atom is taken from the ABox. So, the set of formulas $\Phi_i$ contains

$$\bigvee_{P \text{ is a predicate}} \bigl( (r_i = P) \land P(u_{i1}, \dots, u_{iL}) \bigr)$$

for the case when $\tau_i$ is taken from the ABox ($r_i$ is such that $P(u_{i1}, \dots, u_{iL})$ is
in the ABox for the predicate $P$ with the number $r_i$) and the following disjunct,
for each TGD

$$\forall \vec{x}\, \bigl(P_1(t_{11}, \dots, t_{1L}) \land \dots \land P_m(t_{m1}, \dots, t_{mL}) \to \exists \vec{z}\, P_0(t_{01}, \dots, t_{0L})\bigr)$$

in $\mathcal{T}$, modelling the corresponding chase rule application:

$$(r_i = P_0) \land \bigwedge_{t_{0j} = x_l} (u_{ij} = w_{il}) \land \bigwedge_{t_{0j} \text{ existential}} (u_{ij} = \mathit{null}_{ij}) \land \bigwedge_{k=1}^{m} \bigvee_{i'=1}^{i-1} \Bigl( (r_{i'} = P_k) \land \bigwedge_{t_{kj} = x_l} (u_{i'j} = w_{il}) \Bigr).$$

Informally, if $\tau_i$ is generated by an application of the TGD above, then $r_i$ is the
number of the head predicate $P_0$ and the existential variables $u_{ij}$ of the head
get unique null values $\mathit{null}_{ij}$ (third conjunct). Then, for each of the $m$ atoms of
the body, one can choose a number $i'$ that is less than $i$ such that the predicate
of $\tau_{i'}$ is the same as the predicate of the body atom and their arguments match
(the last two conjuncts). The variables $w_{il}$ ensure that the same universally
quantified variable gets the same value in different body atoms and in the head
(if it occurs there, see the second conjunct).

It can be verified that $|q'| = O(|q| \cdot |\mathcal{T}| \cdot N^2 \cdot L)$ and that $(\mathcal{T}, \mathcal{A}) \models q$ iff $q'$ is
true in the model $\mathcal{I}_{\mathcal{A}}$ extended with constants $1, \dots, N \cdot L$ (these constants are
distinct and do not belong to the interpretation of any predicates but $=$).

It should be noted that one can replace the numbers in the rewriting with
just two constants 0 and 1 (again, with only $=$ interpreted over them). Each of
the variables $u_{ij}$ can be replaced with a tuple $u_{ij}, u_{ij}^1, \dots, u_{ij}^p$ of variables with
$p = \lceil \log(N \cdot L) \rceil$ such that $u_{ij}$ ranges over the ABox elements and $u_{ij}^1, \dots, u_{ij}^p$
range over $\{0, 1\}$ and thus represent a number up to $N \cdot L$. Similarly, we replace
the $w_{il}$ and $r_i$. Each labelled null $\mathit{null}_{ij}$ is then replaced by the constant tuple
representing the number $(i - 1) \cdot L + j - 1$ in binary; the constants $P$ for
the numbers of predicates $P$ are dealt with similarly. Finally, the equality
atoms in the rewriting are replaced by the component-wise equalities and each
$P(u_{i1}, \dots, u_{iL})$ is replaced by $P(u_{i1}, \dots, u_{iL}) \land \bigwedge_{j=1}^{L} \bigwedge_{l=1}^{p} (u_{ij}^l = 0)$. □
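
The binary encoding of labelled nulls described at the end of the proof can be illustrated with a short sketch. The function names `bits_needed` and `null_code` and the choice of most-significant-bit-first order are assumptions of this example; the numbering $(i - 1) \cdot L + j - 1$ and the bit width $p = \lceil \log(N \cdot L) \rceil$ are taken from the text.

```python
def bits_needed(N, L):
    """p = ceil(log2(N * L)), computed exactly with integer arithmetic."""
    return max(1, (N * L - 1).bit_length())

def null_code(i, j, N, L):
    """Bit tuple (most significant bit first) for the labelled null null_ij.

    The null is numbered (i - 1) * L + (j - 1), for 1 <= i <= N, 1 <= j <= L.
    """
    if not (1 <= i <= N and 1 <= j <= L):
        raise ValueError("index out of range")
    p = bits_needed(N, L)
    k = (i - 1) * L + (j - 1)
    return tuple((k >> b) & 1 for b in reversed(range(p)))

# Illustrative sizes (assumptions): N = 3 chase steps, arity L = 2.
all_codes = [null_code(i, j, 3, 2) for i in range(1, 4) for j in range(1, 3)]
```

With these sizes there are six nulls, each encoded by a distinct tuple of three bits, so every constant from 1 to $N \cdot L$ in the rewriting can be traded for a tuple over the two constants 0 and 1 at a logarithmic cost per variable.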

Thus, we obtain the following: 

Theorem 15. Impure PE- and NDL-rewritings for CQs and OWL 2 QL
ontologies are exponentially more succinct than pure PE- and NDL-rewritings.

9 Conclusion 

The exponential lower bounds for the size of 'pure' rewritings above may look
discouraging in the OBDA context. It is to be noted, however, that the
ontologies and queries used in their proofs are extremely 'artificial' and never
occur in practice (see the analysis in [21]). As demonstrated by the existing
description logic reasoners (such as FaCT++, HermiT, Pellet, Racer), real-world
ontologies can be classified efficiently despite the high worst-case complexity
of the classification problem. We believe that practical query answering over
OWL 2 QL ontologies can be feasible if supported by suitable optimisation and
indexing techniques. It also remains to be seen whether polynomial impure
rewritings can be used in practice.

We conclude the paper by mentioning two open problems. Our exponential 
lower bounds were proved for a sequence of pairs (q n , T n ). It is unclear whether 
these bounds hold uniformly for all q n over the same T: 

Question 16. Do there exist an OWL 2 QL TBox $\mathcal{T}$ and CQs $q_n$ such that any
pure PE- or NDL-rewritings for $q_n$ and $\mathcal{T}$ are of exponential size?

As we saw, both FO- and NDL-rewritings are more succinct than PE- 
rewritings. 

Question 17. What is the relation between the size of FO- and NDL-rewritings?